Microsporidian genome analysis reveals evolutionary 
strategies for obligate intracellular growth 
Microsporidia comprise a large phylum of obligate intracellular eukaryotes that are fungal-related parasites responsible for
widespread disease, and here we address questions about microsporidia biology and evolution. We sequenced three
microsporidian genomes from two species, Nematocida parisii and Nematocida sp1, which are natural pathogens of Caenorhabditis
nematodes and provide model systems for studying microsporidian pathogenesis. We performed deep sequencing of 
transcripts from a time course of N. parisii infection. Examination of pathogen gene expression revealed compact transcripts 
and a dramatic takeover of host cells by Nematocida. We also performed phylogenomic analyses of Nematocida and other 
microsporidian genomes to refine microsporidian phylogeny and identify evolutionary events of gene loss, acquisition, 
and modification. In particular, we found that all microsporidia lost the tumor-suppressor gene retinoblastoma, which we 
speculate could accelerate the parasite cell cycle and increase the mutation rate. We also found that microsporidia acquired 
transporters that could import nucleosides to fuel rapid growth. In addition, microsporidian hexokinases gained secretion 
signal sequences, and in a functional assay these were sufficient to export proteins out of the cell; thus hexokinase may be 
targeted into the host cell to reprogram it toward biosynthesis. Similar molecular changes appear during formation of 
cancer cells and may be evolutionary strategies adopted independently by microsporidia to proliferate rapidly within host 
cells. Finally, analysis of genome polymorphisms revealed evidence for a sexual cycle that may provide genetic diversity to 
alleviate problems caused by clonal growth. Together these events may explain the emergence and success of these diverse 
intracellular parasites. 



[Supplemental material is available for this article.] 

There are countless species of obligate intracellular microbes, 
which by definition are completely dependent on intracellular 
resources from their hosts. These microbes include pathogens as 
well as symbionts, which are thought to be the ancestors of in- 
tracellular organelles like mitochondria (Keeling 2011). The strat- 
egies used by obligate microbes to thrive within host cells remain 
enigmatic, despite obligate intracellular pathogens being re- 
sponsible for significant medical and agricultural problems. For 
example, ~10% of college students in the United States are infected
with the poorly understood pathogen Chlamydia trachomatis,
which is an obligate intracellular bacterium and a leading
cause of genital infections and blindness (James et al. 2008). In 
addition to prokaryotic pathogens, eukaryotic obligate intracel- 
lular pathogens also cause severe morbidity and mortality in a va- 
riety of hosts. For example, Microsporidia comprise a phylum with 
over 150 genera containing more than 1200 species of parasites 
that can infect and kill animals from virtually every phylum, with 
few treatments available (Keeling and Fast 2002; Williams 2009). 
Specifically, microsporidia cause opportunistic infections in AIDS 
patients, as well as organ transplant recipients, malnourished
children, and the elderly (Didier and Weiss 2011). Microsporidia 
also cause agricultural disease: 50%-94% of honey bees in the 
United States are infected with the microsporidian Nosema ceranae, 
which has been implicated in honey-bee colony collapse disorder 
(Johnson et al. 2009; Troemel 2011).

Microsporidian genomes encode a greatly reduced metabolic 
potential compared with other eukaryotes (Keeling and Corradi 
2011), yet these microbes can still undergo dramatic proliferation 
within host cells. This rapid growth likely places a substantial 
metabolic burden on the host to generate building blocks such 
as nucleotides, amino acids, and lipids, which are needed by the 
replicating parasite cells. How microsporidia or other obligate in- 
tracellular microbes direct the host to make these biosynthetic 
factors and how they gain preferential access to them is not well 
understood. 

Although Microsporidia appear to be most closely related to 
Fungi, it is debated whether they should be included within the 
Fungal Kingdom or excluded from it (James et al. 2006). A major
challenge in phylogenetic assignment of the Microsporidia is that 
they have highly divergent sequences (Stiller and Hall 1999; 
Katinka et al. 2001), which could be due to rapid proliferation and/ 
or a high mutation rate. Previous phylogenetic analyses have relied 
on comparisons of only one or a small number of genes and have 
suggested multiple placements of Microsporidia relative to Fungi 
(Edlind et al. 1996; Hirt et al. 1999; Keeling et al. 2000). Comparisons
Table 1. Statistics of Nematocida genomes



of many shared genes among Microsporidia 
and Fungi could help resolve these controversies
regarding microsporidia phylogeny.

Although all species of Microsporidia 
are thought to share general features in 
their life cycle such as invasion using a 
polar tube, intracellular replication as 
meronts, and differentiation into spores, 
the details of their life cycle can be com- 
plex, and they vary from species to species. 
On a broader scale, it is still unclear 
whether microsporidia are haploid or 
diploid, and whether they undergo a 
mating and meiosis cycle. Previously se- 
quenced microsporidian genomes are 
presumed to be haploid. However, some 
species of microsporidia are thought to 

undergo a sexual cycle (Becnel et al. 2005), and recent studies have 
indicated that the gene order of the zygomycete mating locus is 
conserved in the microsporidia (Lee et al. 2008). This observation 
suggests that microsporidia could have a diploid stage and a mat- 
ing cycle, and may be true Fungi. Answering questions of ploidy, as 
well as questions about mating and recombination, could provide 
insight into microsporidian speciation and pathogenic mecha- 
nisms, since mating appears to be important for the generation of 
diversity and virulence in other pathogenic fungi (Ni et al. 2011). 

Recently, we discovered natural parasites of the nematodes 
Caenorhabditis elegans and Caenorhabditis briggsae and showed that 
they define a new microsporidian genus (Nematocida) that contains
two species (parisii and sp1) (Troemel et al. 2008). Because
C. elegans is a well-understood model organism, the N. parisii- 
C. elegans system provides a powerful model in which to decipher 
microsporidian biology. Microsporidia, like most obligate intra- 
cellular pathogens, are refractory to genetic analysis, and thus ge- 
nome analysis provides the best current alternative to investigate 
questions of evolution and pathogenesis of the microsporidia. 
Here we sequence the genomes of three Nematocida strains and 
perform deep-sequencing expression analysis during distinct
stages of infection. We compare these genomes to several other 
divergent microsporidia genomes, as well as other eukaryotic ge- 
nomes, to better understand the evolutionary strategies that led 
to the emergence of this large phylum of obligate intracellular 
pathogens. 

Results 

Nematocida genome assembly and content 

We sequenced and assembled the genomes of three Nematocida 
strains (Supplemental Fig. S1): two strains of N. parisii collected
from wild-caught C. elegans in France (ERTm1 and ERTm3) and one
of a divergent species termed Nematocida sp1 collected from
wild-caught C. briggsae in India (ERTm2) (Troemel et al. 2008). Each
strain was propagated in the standard N2 laboratory strain of 
C. elegans. The assemblies of the two N. parisii strains are very similar 
in size, each totaling 4.1 Mb (Table 1), and contain ~95% of their
sequence on eight or nine scaffolds of ~150 to ~900 kb (Supplemental
Table S2). Because some scaffolds may be complete chromosomes,
this suggests that there are fewer than eight chromosomes
in N. parisii. These genomes are 99.8% identical, and 96% of ERTm1
can be uniquely aligned to ERTm3. Thus, the low divergence of these 
two isolates supports their previous designation as the same species. 
The genome of Nematocida sp1 (ERTm2) is highly diverged
from the N. parisii (ERTm1 and ERTm3) assemblies. Alignments of
ERTm1 and ERTm2 cover only 1.7 Mb, with 68.3% average identity
and 80.2% average similarity in protein-based alignments. Despite
this divergence, these genomes appear highly syntenic with
a small number of intrascaffold inversions (Supplemental Fig. S1).
The Nematocida sp1 (ERTm2) assembly is 4.5 Mb, 14% larger than
the assemblies of the two N. parisii strains. This expansion can be
explained in part by a larger amount of repetitive sequence (2.1-fold
more in Nematocida sp1) (Table 1; Supplemental Table S2),
some of which share sequence similarity with transposable ele- 
ments (see Supplemental Methods). 

RNA-seq analysis of gene expression 

To determine the transcribed regions of the N. parisii (ERTm1)
genome and define gene expression during its infection cycle, we 
performed strand-specific deep sequencing of RNA (RNA-seq) iso- 
lated at distinct stages of N. parisii infection in C. elegans. N. parisii, 
like other microsporidia, undergoes a complex infection cycle, 
with distinct stages of parasite morphology (Fig. 1A). To produce
a synchronous infection for RNA-seq analysis, we inoculated
C. elegans with a large dose of N. parisii spores and quantified infection
progression by DIC microscopy (Fig. 1B,D). We also visualized
infection with fluorescence in situ hybridization (FISH)
staining, which illustrates the rapid proliferation of N. parisii
throughout the infection cycle within the host (Fig. 1C). Based on
these results, we isolated RNA from C. elegans at five time points (8,
16, 30, 40, and 64 h post-inoculation [hpi] with N. parisii) (Fig. 1D)
to capture gene expression at distinct stages of infection. The early 
time-point samples (8, 16, and 30 hpi) contain only pathogens in 
the replicative meront stage, while later time-point samples (40 
and 64 hpi) contain a mixture of pathogens in the meront, sporont, 
and mature transmissible spore stages. 

In addition to microscopy, RNA-seq and qRT-PCR analysis of 
a mixed pathogen and host RNA sample illustrates how rapidly this 
pathogen replicates during infection. The abundance of N. parisii 
reads comprised only a small fraction of the sample at early time 
points but represented an increasing percentage of RNA as in- 
fection progressed, a finding that was confirmed with qRT-PCR 
(Supplemental Table S4). At 8 hpi, only 0.26% of the total RNA-seq 
reads corresponded to the N. parisii genome, and this fraction increased
until 40 hpi, when ~28% of the reads corresponded to
N. parisii (Fig. 1D; Supplemental Table S4), despite the pathogen being
restricted to the worm intestine. Based on the increase in number 
Figure 1. Characterization of N. parisii infection stages in C. elegans. (A) Diagram of infection stages.
N. parisii infects C. elegans intestinal cells (green), where it grows intracellularly from small mononucleate
sporoplasms to large, multinucleate meronts. Meronts develop into spores, which at late stages
of infection can be enclosed within vesicles. (B) Representative DIC images of animals are shown displaying
distinct infection-associated phenotypes with boxed areas enlarged and shown as insets. (Arrow)
Small spore; (arrowhead) large spore. (C) N. parisii-specific FISH (red) and DAPI (blue) staining of
animals at different stages of infection. Note: Although animals do not exhibit infection symptoms by
DIC at 8 hpi, sporoplasm is visible by fluorescent in situ hybridization (FISH) at 8 hpi. (B,C) Scale bar,
10 μm. (D) The y-axis on the left indicates the fraction of animals in a population exhibiting specific
symptoms of infection visualized by DIC (n > 22 animals assayed per time point) (see Supplemental
Methods). RNA sample collection times are indicated on the graph: (1) 8 hpi, during the sporoplasm
stage as observed by FISH; (2) 16 hpi, during the early meront stage; (3) 30 hpi, during the late meront
stage; (4) 40 hpi, when spores have just started forming; and (5) 64 hpi, when Nematocida spores can be
found within membrane-bound vesicles. The y-axis on the right indicates the contribution of N. parisii
reads to total reads from RNA-seq analysis.



of RNA-seq reads, we estimated the doubling time of N. parisii to be 
3.3 h during the replicative meront stage of infection (between 16 
and 30 hpi), and based on qRT-PCR to be 2.9 h (see Supplemental 
Methods). These estimates are close to the 2.9-h doubling time of 
Schizosaccharomyces pombe in rich culture media (Fantes 1977), 
suggesting a finely tuned adaptation of this pathogen to intra- 
cellular growth conditions. 
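
The doubling-time estimates above rest on an exponential-growth assumption: if parasite abundance rises by a factor f over an interval of t hours, the doubling time is t · ln 2 / ln f. The short Python sketch below illustrates that arithmetic only. The function name and the example abundances are hypothetical and are not values from this study, and treating the pathogen read fraction as a proxy for parasite abundance is itself an approximation.

```python
import math


def doubling_time(abundance_start, abundance_end, hours_elapsed):
    """Doubling time in hours, assuming exponential growth between two samples."""
    if abundance_start <= 0 or abundance_end <= abundance_start:
        raise ValueError("abundance must be positive and increasing")
    fold_change = abundance_end / abundance_start
    return hours_elapsed * math.log(2) / math.log(fold_change)


# Hypothetical abundances (not values from this study): a 16-fold increase,
# i.e. four doublings, across the 14 h between 16 and 30 hpi.
example_hours = doubling_time(1.0, 16.0, 30 - 16)
print(f"{example_hours:.2f} h per doubling")
```

Any abundance measure that scales with parasite load (read counts, qRT-PCR signal) could be passed in, provided both samples are normalized in the same way.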



We predicted 2661 genes in the 
N. parisii (ERTm1) genome, 2546 of which
had RNA-seq-based evidence for expres- 
sion during infection (Table 1). These 
numbers are similar in magnitude to the 
gene count from other microsporidia 
genomes, e.g., Encephalitozoon cuniculi 
(1996) (Katinka et al. 2001), Encephalito- 
zoon intestinalis (1833) (Corradi et al. 2010), 
and N. ceranae (2060) (Cornman et al. 
2009). Forty-three percent of N. parisii 
proteins can be assigned a putative func- 
tion from Pfam domains, and 20 of the 
43% also could be assigned Gene Ontol- 
ogy (GO) terms (Table 1). This is less than 
other microsporidia; using the same 
methods, 66% of E. cuniculi genes could 
be assigned Pfam domains, and 52% 
could be assigned both Pfam domains 
and GO terms. The large number of 
Nematocida-specific genes (see below)
could account for some of this difference. 
We also identified 198 proteins (7.5% of 
the proteome) predicted to be secreted 
in ERTm1 (Table 1), which is similar to
predictions for E. cuniculi (5.2%) and the
ascomycete fungi S. pombe (4.5%) and 
Aspergillus nidulans (8.6%). 

No introns were predicted in protein- 
coding genes in any of the three genomes. 
Based on the RNA-seq data, we also did not 
observe any spliced mRNA transcripts in 
these genomes. Numerous RNA compo- 
nents of the spliceosome machinery were 
missing (e.g., U2 and U6 snRNAs, U2AF 
splicing factor) as well as the SRPK and 
PRP4 kinases that regulate splicing activity,
further supporting the predicted lack
of any spliced transcripts in Nematocida. 
Splicing machinery also appeared lost in 
the Enterocytozoon bieneusi and Vittaforma 
corneae clades, an absence previously noted 
in E. bieneusi (Akiyoshi et al. 2009). The 
remaining microsporidia in the phyloge- 
netic analysis (see below) appear to have 
intact splicing machinery, suggesting mul- 
tiple independent losses of splicing within 
microsporidia. 

The N. parisii (ERTm1) genome is
compact, because 72.8% of the sequence 
is predicted to be coding, with a mean 
distance between coding sequences of 
418 bp. To determine the relationship of 
these short intergenic distances to tran- 
script structure, we used the RNA-seq data 
to predict untranslated regions (UTRs) in ERTm1 and produce the
first large-scale annotation of microsporidian UTRs. Most genes are 
encoded by discrete transcripts with either unusually short UTRs or 
no UTRs at all (Supplemental Fig. S2; Supplemental Material), in 
contrast to the large multigenic transcripts previously identified in 
E. cuniculi (Williams et al. 2005; Gill et al. 2010). Calculation of
RNA-seq read coverage across five feature types (coding, 5' UTR, 3' 
UTR, antisense, and intergenic) showed that antisense transcription
is approximately equivalent to intergenic transcription (Supplemental
Fig. S3). This result suggests that there is little antisense
transcription in Nematocida despite the close proximity of coding 
sequences. The small size of noncoding intergenic regions likely 
constrains the size of UTRs in N. parisii to avoid overlap with coding 
sequences. Within intergenic regions we identified five candidate 
promoter elements, including a TATA box and a modified version 
of the CCC motif previously described in N. ceranae (Supplemental 
Fig. S4; Supplemental Material; Cornman et al. 2009). 
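
As an illustration of how compactness figures such as those above can be derived from an annotation, the following Python sketch computes the coding fraction and the mean distance between adjacent coding sequences from gene coordinates. The function name, the coordinate convention, and the toy coordinates are assumptions made for this example. The study's own procedure, including how overlapping genes or scaffold ends were handled, is not described in this section.

```python
def compactness(genes, genome_length):
    """Return (coding fraction, mean gap between adjacent coding sequences).

    genes: list of (scaffold, start, end) with 1-based inclusive coordinates.
    Overlapping or abutting genes are merged, so they add no gap and their
    shared bases are counted once.
    """
    by_scaffold = {}
    for scaffold, start, end in genes:
        by_scaffold.setdefault(scaffold, []).append((start, end))

    coding_bases = 0
    gaps = []
    for intervals in by_scaffold.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:
                # Overlapping or abutting: extend the current merged block.
                cur_end = max(cur_end, end)
            else:
                coding_bases += cur_end - cur_start + 1
                gaps.append(start - cur_end - 1)
                cur_start, cur_end = start, end
        coding_bases += cur_end - cur_start + 1

    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return coding_bases / genome_length, mean_gap


# Toy annotation (hypothetical coordinates): two scaffolds totaling 2000 bp.
toy_genes = [
    ("s1", 1, 300), ("s1", 401, 900), ("s1", 1001, 1500),
    ("s2", 101, 500),
]
coding_fraction, mean_gap = compactness(toy_genes, genome_length=2000)
print(coding_fraction, mean_gap)
```

Only gaps between neighboring genes on the same scaffold are counted here; sequence between a scaffold end and the first or last gene is ignored, which is one of several reasonable conventions.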

We also sequenced RNA from purified N. parisii spores and 
identified spore-specific transcripts by comparing these expression 
data to the infection time-course data (Supplemental Table S6). 
N. parisii spore transcripts are predominantly composed of ribosomal 
RNA with very little poly (A) RNA (see Supplemental Material). We 
identified the genes most highly expressed overall in spores (Sup- 
plemental Table S7), which include several transmembrane- 
domain-containing genes. These genes could encode structural 
proteins or could encode receptor proteins that recognize cues that 
trigger firing of the polar tube, the initiating event in microsporidia 
infection. 

Heterozygosity and loss of heterozygosity in Nematocida
genomes 

Analysis of the sequence data suggests that Nematocida sp1
(ERTm2) is diploid, highly heterozygous, and has undergone 
loss of heterozygosity (LOH) in regions of some scaffolds. A total 
of 42,175 heterozygous single nucleotide polymorphism (SNP) 



positions were identified in ERTm2 using Illumina sequencing.
The reference and alternate allele are each supported by roughly 
equal proportions of the reads (the median allele balance at SNP 
positions is 0.56), in support of a diploid model. By examining the 
distribution of SNPs across the genome, we found that most of the 
assembly is heterozygous, but that scaffolds 1, 2, and 4 have each 
undergone LOH as evidenced by a single large homozygous region 
on each scaffold (Fig. 2A). These homozygous regions include only 
0.5% of the SNPs and comprise 21% of the genome, with an average
of one SNP every 4497 bases. The heterozygous regions include
99.5% of the SNPs and comprise 79% of the genome (examining
scaffolds at least 30 kb in length), with an average of one SNP
every 82 bases. SNPs are found in coding regions nearly as much as 
expected; 63.1% of SNPs are in coding regions, and coding regions 
cover 64.4% of the ERTm2 sequence. Most (69.2%) of the SNPs in 
coding regions result in synonymous amino acid substitutions. We 
also analyzed the 454 Life Sciences (Roche) sequence reads for 
N. parisii and found that both ERTm1 and ERTm3 appear to be
diploid and heterozygous, although there is a lower rate of polymorphism
for these strains compared with N. sp1 ERTm2 (Fig. 2B).
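
The two quantities used in this analysis, allele balance at heterozygous positions and SNP density along a scaffold, can be sketched in a few lines of Python. Everything named below is hypothetical: the function names, the choice of the reference allele as the numerator of the balance, the use of non-overlapping 5-kb windows in place of sliding windows, and the density threshold used to flag candidate LOH regions are assumptions for illustration, not the procedure used in this study.

```python
from statistics import median


def allele_balance(ref_reads, alt_reads):
    """Fraction of reads supporting the reference allele at one SNP."""
    return ref_reads / (ref_reads + alt_reads)


def snp_density(positions, scaffold_length, window=5000):
    """SNPs per kb in consecutive non-overlapping windows along one scaffold."""
    n_windows = -(-scaffold_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:  # 1-based SNP positions
        counts[(pos - 1) // window] += 1
    densities = []
    for i, count in enumerate(counts):
        # The last window may be shorter than the others.
        width = min(window, scaffold_length - i * window)
        densities.append(count / (width / 1000))
    return densities


def low_heterozygosity_windows(densities, max_snps_per_kb=1.0):
    """Indices of windows whose SNP density is at or below the threshold."""
    return [i for i, d in enumerate(densities) if d <= max_snps_per_kb]


# Toy data (hypothetical): read counts at three SNPs, and a 15-kb scaffold
# that is SNP-dense for 10 kb and nearly SNP-free in its final 5 kb.
toy_sites = [(12, 10), (9, 11), (15, 15)]
toy_median_balance = median(allele_balance(r, a) for r, a in toy_sites)

toy_positions = list(range(50, 10000, 100)) + [12000]
toy_densities = snp_density(toy_positions, scaffold_length=15000)
toy_loh = low_heterozygosity_windows(toy_densities)
print(toy_median_balance, toy_densities, toy_loh)
```

A median balance near 0.5 is what a diploid, heterozygous genome would be expected to show, and a run of adjacent low-density windows is the pattern interpreted above as loss of heterozygosity.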

The heterozygous and LOH regions of Nematocida sp1
(ERTm2) appear stable over several generations of being trans- 
mitted in the C. elegans host. The original ERTm2 DNA sample used 
for sequencing was derived from a pooling of many infected 
samples, which were prepared in parallel from a small number of 
infected donor animals (see Supplemental Methods). Next, we 
serially passaged ERTm2 in C. elegans for 14 wk and then isolated 
genomic DNA for sequencing. The polymorphism pattern in this 
passaged isolate was nearly identical to the original isolate with 
Figure 2. Distribution of SNPs in Nematocida highlights loss of heterozygosity. (A) The frequency of SNPs for sliding windows of 5 kb is plotted across the
11 largest scaffolds of Nematocida sp1 (ERTm2). Values on the x-axis correspond to scaffold position in kilobases; values on the y-axis correspond to the SNP
frequency (SNPs/kb). The pattern of the initial isolate (red) is nearly identical to the passaged isolate (blue). Large regions exhibiting loss of heterozygosity
are present in terminal regions of scaffolds 1, 2, and 4. (B) The frequency of SNPs for sliding windows of 5 kb is plotted across the 11 largest scaffolds of
N. parisii (ERTm1). Values on the x-axis correspond to scaffold position in kilobases; values on the y-axis correspond to the SNP frequency (SNPs/kb). The
patterns of SNPs for ERTm1 (blue) and ERTm3 (red) are highly similar.
no additional LOH regions (Fig. 2A), indicating that ERTm2
was not undergoing rapid LOH during laboratory passage in the
C. elegans host, and thus recombination is uncommon under these
conditions. 

Because all three Nematocida strains appeared to be diploid,
suggesting that microsporidia may have a mating and a meiotic
cell cycle, we identified potential orthologs of characterized mating
and meiosis genes from other fungi (Supplemental Tables S8,
S9). Of a set of 29 proteins suggested as a core meiotic set (Malik
et al. 2008) mostly involved in chromosome cohesion and recombination,
20 orthologs are present in the Nematocida genomes.
This pattern of conservation is very similar to that found in
E. cuniculi and N. ceranae; one exception is the presence of Rec8
meiotic centromere cohesin found only in Nematocida species. All
microsporidia appear to be missing several components (Apc1,
Apc4, Apc5, and Cdc26) of the anaphase promoting complex. In
summary, Nematocida genomes encode a few key meiosis genes not
found in other microsporidian genomes (Supplemental Tables S8,
S9). Perhaps these meiosis genes have been retained in Nematocida
due to the shorter period of time that these parasites have been
passaged in a laboratory setting, compared with other microsporidian
species.



We also identified a Nematocida high-mobility group (HMG) 
domain gene, which is part of a candidate mating-type locus previously
identified in other microsporidia, based on comparisons to
zygomycetes (Lee et al. 2008). In each of the three Nematocida 
genomes, there was a single copy of this gene (NEPG_01497/ 
NEQG_01453/NERG_01188). There were no amino acid changes 
in this gene between the two alleles in ERTm2; however, there were 
differences between the gene in ERTm2 and in ERTm1/m3.

Phylogenomics of the Microsporidia and Nematocida 

To better understand genome evolution in Nematocida and the 
Microsporidia overall, we compared Nematocida genomes to those 
of seven additional microsporidia, 13 other fungi, and an outgroup,
the choanoflagellate Monosiga brevicollis. We then estimated
a maximum likelihood phylogeny based on 53 single-copy core 
orthologs (Fig. 3). Microsporidia were resolved as the earliest 
branching clade within the sequenced fungi. This result is consistent
with a parallel phylogenetic analysis (Capella-Gutierrez
et al. 2012) and with the prior report that microsporidia branch 
early within the Fungal Kingdom, and group with the most basally 
known fungus, Rozella allomycis (James et al. 2006). The three
Figure 3. Phylogeny and ortholog content of Microsporidia and Fungi. The phylogeny was inferred by maximum likelihood using RAxML (Stamatakis
2006) based on the concatenated amino acid sequences of 53 single-copy orthologs shared by all taxa. Bootstrap values above 50% are listed above
respective nodes. Ortholog counts are shown as bar graphs to the right of the tree and are divided into six categories: core (green, found in all genomes),
shared (blue, found in at least two genomes excluding other Microsporidia and Nematocida categories that follow), Microsporidia-specific (orange),
Nematocida-specific (red), N. parisii-specific (purple), and unique (yellow). The presence of E2F, DP, and RB orthologs is indicated in the table on the right,
with E2F and DP activators in green, and the RB inhibitor in red. Dikarya cell cycle regulation circuitry is evolutionarily distinct from the RB-E2F pathway
(Cao et al. 2010).



2482 Genome Research 



www.genome.org 



Evolutionary strategies of microsporidia 



chytrid fungi (Chytridiomycota and Blastocladiomycota) were re- 
solved in our phylogeny as the next most basal monophyletic 
clade. Based on these data, the hypothesis that Microsporidia 
could be a sister group to Mucoromycotina including Rhizopus, as 
had been previously proposed based on gene order conservation 
(Lee et al. 2010), is rejected (p = 7 x lO""**, AU test) (Shimodaira 
2002). Placement of the Microsporidia as a sister group to the 
chytrids was also rejected (p = 5 X 10 AU test). Within the 
Microsporidia, Nematocida is the most basal clade in our analysis.
Most relationships are consistent with previously published 
microsporidian phylogenies (Troemel et al. 2008), although 
Antonospora is positioned less closely to Nematocida, and the position
of Vavraia is more basal than previously estimated (Troemel
et al. 2008). However, previous phylogenetic analysis used Vavraia
oncoperae for comparison, while Vavraia culicis was used here. These
two species may not form a monophyletic genus, which would 
explain the above discrepancies. 
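
The phylogeny discussed above was estimated from the concatenated amino acid sequences of 53 single-copy orthologs shared by all taxa. The Python sketch below shows one simple way such a concatenated alignment (supermatrix) could be assembled from per-gene alignments before being passed to a tree-building program. The function name, the dictionary-based input, and the toy sequences are assumptions for illustration; the alignment and concatenation tools actually used in the study are not stated in this section.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one supermatrix.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    Every gene must contain the same taxa (single-copy, shared by all),
    and all sequences within one gene must have the same aligned length.
    """
    taxa = sorted(alignments[0])
    parts = {taxon: [] for taxon in taxa}
    for gene in alignments:
        if sorted(gene) != taxa:
            raise ValueError("gene alignment does not cover the same taxa")
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene alignment differ in length")
        for taxon in taxa:
            parts[taxon].append(gene[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy input (hypothetical sequences): two genes aligned across two taxa.
toy_alignments = [
    {"taxonA": "MK-L", "taxonB": "MKAL"},
    {"taxonA": "GG", "taxonB": "GA"},
]
toy_supermatrix = concatenate_alignments(toy_alignments)
print(toy_supermatrix)
```

Requiring every taxon in every gene mirrors the "shared by all taxa" criterion; a more permissive version would pad missing taxa with gap characters instead of raising an error.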

This phylogeny also highlights the large divergence within 
the Microsporidia, and between Nematocida and other species in
particular. While N. parisii and Nematocida sp1 orthologs share 62%
identity on average, N. parisii and E. cuniculi orthologs share only
31% identity (Supplemental Fig. S5), suggesting that only highly
conserved genes will be detected as orthologs given such high
average divergence. Mapping orthologs onto the phylogenetic tree
identified 882 core microsporidian genes (ortholog clusters present
in eight or more genomes), with 137 of these being specific to
Microsporidia (Fig. 3). However, the majority of Microsporidia-specific
genes lacked any functional assignment. We therefore altered
our search to find ortholog clusters with a predicted function
(i.e., Pfam domains or kinase annotations) that were either unique 



to Microsporidia or shared between Microsporidia and other basal 
groups, including the choanoflagellate Monosiga brevicollis and the 
basal fungi Chytridiomycota and Rhizopus oryzae (Table 2; Supple- 
mental Material). 
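
The classification of ortholog clusters described above (core clusters present in eight or more genomes, and the subset of those found only in Microsporidia) can be expressed as a simple presence/absence test. The Python sketch below is illustrative only: the function name, the input layout, and the genome labels are hypothetical, and it assumes that the "eight or more genomes" criterion refers to microsporidian genomes.

```python
def classify_clusters(clusters, microsporidia, min_genomes=8):
    """Split ortholog clusters into core and Microsporidia-specific sets.

    clusters: dict mapping cluster id -> set of genome names with a member.
    microsporidia: set of microsporidian genome names.
    Returns (core, specific), where specific is the subset of core clusters
    that have no member outside the Microsporidia.
    """
    core, specific = [], []
    for cluster_id, genomes in clusters.items():
        n_microsporidia = len(genomes & microsporidia)
        if n_microsporidia >= min_genomes:
            core.append(cluster_id)
            if genomes <= microsporidia:
                specific.append(cluster_id)
    return core, specific


# Toy input (hypothetical labels): ten microsporidian genomes and one fungus.
toy_microsporidia = {f"m{i}" for i in range(1, 11)}
toy_clusters = {
    "c1": toy_microsporidia | {"fungus1"},   # core, shared with a fungus
    "c2": {f"m{i}" for i in range(1, 9)},    # core, Microsporidia-specific
    "c3": {"m1", "m2", "m3"},                # too few genomes to be core
}
toy_core, toy_specific = classify_clusters(toy_clusters, toy_microsporidia)
print(toy_core, toy_specific)
```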

Analysis of microsporidia-specific Pfam domains identified 
the nucleoside phosphate transporter (Npt) proteins, which have 
been implicated in nucleotide "stealing" from host cells (Table 2).
The obligate intracellular bacterial pathogen Chlamydia trachomatis
encodes two such transporters: Npt1 specifically transports
ATP/ADP, while Npt2 transports all four ribonucleotides (Tjaden
et al. 1999). Four such transporters have previously been described
in the microsporidian E. cuniculi, with three expressed on the
plasma membrane to import ATP from the host, and one on the
mitosomal membrane to import ATP for Fe/S clustering in this
genomeless organelle (Goldberg et al. 2008; Tsaousis et al. 2008).
The four E. cuniculi transporters have a preference for ATP but may
import other nucleotides as well. The Nematocida genome only
encodes two Npt transporters. Phylogenetic analysis indicates that
a single ancestral transporter exhibited organism-specific duplications
in Nematocida, Antonospora, and Vavraia (Supplemental Fig.
S6). Further duplications led to greater numbers of these transporters
in different microsporidia species, such as E. cuniculi. We
speculate that all microsporidian genomes encode at least one
transporter to import host nucleotides into the parasite to fuel
parasite growth and DNA replication, and one additional transporter
to import ATP from parasite cytoplasm into the mitosome.
In N. parisii (ERTm1) both nucleotide transporters are expressed at
high levels throughout infection (Supplemental Table S5). One
transporter (NEPG_01813) was among the most highly expressed 
genes at 8 hpi, while the other transporter (NEPG_01539) was 



Table 2. Pfam domains and protein kinases unique to Microsporidia with respect to other Fungi, or shared between Microsporidia and the

Number of genomes with domain

Microsporidia (10) | Chytridiomycota (3) | Mucoromycotina (1) | Dikarya (9) | Monosiga (1) | Pfam domain/kinase | Classification
10 | 0 | 0 | 0 |  | PF03219 Nucleoside phosphate transporter (TLC ATP/ADP transporter) | Nucleotide transport
10 | 0 | 0 | 0 | 0 | PF00182 Chitinase class 1 | Chitin-related
8 | 3 | 0 | 0 | 0 | PF00274 Fructose-bisphosphate aldolase class 1 | Glycolysis
9 | 0 | 1 | 0 |  | PF02121 Phosphatidylinositol transfer protein | Phospholipid trafficking and signaling
10 | 3 | 1 | 0 | 0 | PF01167 Tub family | Phospholipid trafficking and signaling
9 | 1^a | 0 | 1^a | 0 | PF03825 Nucleoside H+ symporter | Nucleoside transport
9 | 2 | 1 | 0 | 0 | PF02224 Cytidylate kinase | Nucleotide metabolism
10^b | 1 | 0 | 0 |  | PF08781 Transcription factor DP | RB-E2F cell cycle (DP)
10 | 1 | 1 | 0 |  | PF02319 E2F/DP family winged-helix DNA-binding domain | RB-E2F cell cycle (E2F)
10 | 1 | 0 | 1^c |  | PF01139 Uncharacterized protein family 27 |
9 | 3 | 1 | 0 |  | WNK With-no-lysine kinase | Ion transport regulation, osmotic regulation
9 | 3 | 1 | 1^d |  | DMPK Dystrophia myotonica-protein kinase | Ion transport regulation, cytoskeletal regulation

The number of genomes analyzed in each column is listed in parentheses.

^a Present in nonorthologous copies in Spizellomyces punctatus and Sclerotinia sclerotiorum.

^b Present as full-length RNA-seq-based transcript in ERTm1 but missing from assembly.

^c Present in Phaeosphaeria nodorum.

^d Present in a nonorthologous copy in Coprinopsis cinerea.
expressed very highly later at 40 hpi during spore formation and
was also expressed in isolated spore RNA. Perhaps NEPG_01813 
imports nucleotides from the host cell into the parasite during 
early stages of infection for rapid DNA replication, while 
NEPG_01539 imports ATP from the parasite cytoplasm into the 
mitosome during later stages of infection. 

Our Pfam analysis also identified a putative nucleoside H+
symporter, which is specific to Microsporidia (Table 2). Expression
of this nucleoside transporter in N. parisii (ERTm1) (NEPG_00699)
can be detected throughout infection (Supplemental Table S5). 
This symporter is homologous to the Escherichia coli gene NupG, a
broad-specificity transporter of purine and pyrimidine nucleosides 
that uses the proton motive force to drive nucleoside transport (Xie 
et al. 2004). Evolutionarily unrelated equilibrative nucleoside
transporters are used by eukaryotic parasites such as Trypanosoma
species, which, like microsporidia, lack complete nucleoside synthesis
pathways and need to import nucleosides from their hosts
(Sanchez et al. 2002). However, unlike the equilibrative nucleoside
transporter family, which is widely distributed across eukaryotes
and is absent from bacteria, the nucleoside H+ symporter family is
found primarily in bacteria, arthropods, and a small number of
other eukaryotes (http://pfam.sanger.ac.uk/family/PF03825). The
lack of any nucleoside H+ symporter in basal fungi suggests that
microsporidia acquired this transporter via horizontal gene trans- 
fer, although the exact source of the transfer is unclear. 

Notably, we found a specific loss of the retinoblastoma (RB) 
tumor-suppressor gene in the Microsporidia phylum (Fig. 3; Table 
2). In most eukaryotes, the RB-E2F pathway regulates the cell cycle 
to promote cell proliferation and is composed of two activator 
proteins called E2F and DP, and a repressor protein called RB. The 
RB-E2F pathway is thought to have been present in the last common 
ancestor of eukaryotes but has been lost in the Dikarya, 
having been replaced by an evolutionarily unrelated but 
functionally analogous cell cycle complex (Schaefer and Breeden 
2004; Cao et al. 2010; Cross et al. 2011). In plants and animals, 
the E2F/DP activators are normally kept in check by the inhibitor 
RB, which is encoded by a tumor-suppressor gene lost in most 
cancers (Polager and Ginsberg 2009). Microsporidia encode E2F and 
DP proteins but lack recognizable RB proteins (through either 
orthology or translated BLAST against microsporidian genomes at 
a cutoff of 1 X 10^*), indicating that Microsporidia have lost the RB 
inhibitor. 

In addition, we identified other domains and protein kinases 
shared between the Microsporidia and basal Fungi, as well as 
candidate Nematocida structural proteins including spore wall 
proteins and polar tube proteins (Supplemental Table S10; 
Supplemental Material). We also identified a large number of genes 
that are specific to the Nematocida genus, as we could not find 
orthologs of these genes in other microsporidia or in other fungi 
(Fig. 3). Only 11% of these Nematocida-specific genes could be 
assigned Pfam domains, leaving most without functional 
assignment. In particular, we found a very large 
Nematocida-specific gene family (Supplemental Fig. S7A) that was 
not similar to any other genes in GenBank but contained a 
conserved domain of 132 amino acids (Supplemental Fig. S7B). Using 
a hidden Markov model (HMM) profile built for this domain, we 
identified 107 members of this gene family in each of ERTm1 and 
ERTm3, and 160 members in ERTm2, representing 4%-6% of the total 
predicted gene content of Nematocida. Proteins in this 
Nematocida-specific family have an average size of 263 amino 
acids, and often contain predicted secretion signals; in 
particular, 31 proteins in this family are predicted to be secreted 
in ERTm1, 26 in ERTm3, and 62 in ERTm2. Phylogenetic analysis 
showed a large number of species-specific expansions in this gene 
family. One large clade that is expanded within Nematocida is 
significantly enriched for genes predicted to be secreted 
(Supplemental Fig. S7A, lower half of tree) (p < 0.0001, test). 
Large families of secreted proteins are characteristic of 
virulence factors in other pathogens (Torto-Alalibo et al. 2010). 
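
An enrichment of secreted proteins within one clade can be assessed with a simple contingency test. The sketch below is illustrative only: the statistical test is not specified in this passage, so a one-sided Fisher's exact test (hypergeometric tail) is used as a stand-in, and the clade counts are hypothetical placeholders rather than values from this study. Only the family size and secreted count are borrowed from the ERTm2 figures quoted above.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k) when drawing n genes
    (the clade) from N family members, of which K are predicted secreted."""
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Family size and secreted count as quoted for ERTm2 in the text.
family_size = 160
secreted_in_family = 62
# Hypothetical clade composition, for illustration only.
clade_size = 50
secreted_in_clade = 40

p = enrichment_p_value(secreted_in_clade, clade_size,
                       secreted_in_family, family_size)
print(f"one-sided p = {p:.3g}")
```

The exact test is a reasonable default for small clades because it makes no large-sample approximation; a different test would be appropriate if the original analysis pooled all three genomes or corrected for tree structure.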

Nematocida metabolism 

Microsporidia are notable for their extremely reduced metabolic 
capability. Similar to analyses of other microsporidia (Keeling et al. 
2010), we found that Nematocida genomes do not encode for 
components of an oxidative phosphorylation pathway or an intact 
tricarboxylic acid (TCA) cycle pathway, but do encode some core 
carbon pathways such as glycolysis (see Supplemental Material 
and below). In addition, seven of eight key pentose phosphate 
pathway enzymes are present in Nematocida genomes, but a 
transaldolase ortholog appears to be absent, as in other micro- 
sporidian genomes. However, we found that Nematocida and other 
microsporidia encode for Class I aldolase (also known as aldolase B) 
(Table 2) and may use it for glycolysis, in contrast to other fungi, 
which use Class II aldolase. Notably, the pentose phosphate 
pathway transaldolase is part of the Class I aldolase family within 
the aldolase superfamily 
(http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.b.bb.html) (Jia et al. 1996). Therefore, it is possible 
that microsporidia use Class I aldolase both in glycolysis and 
the pentose phosphate pathway, because this "double-duty" 
strategy would be an efficient use of the microsporidian enzy- 
matic repertoire. 

Nematocida and other microsporidia have very limited 
biosynthetic and degradative capabilities, as compared with other 
fungi (see Supplemental Material) but have retained a gene that 
encodes CTP synthase, which converts UTP into CTP, a nucleotide 
used for RNA/DNA synthesis and lipid synthesis (Chang and 
Carman 2008). CTP synthase is also present in the reduced genome 
of C. trachomatis, which encodes an active CTP synthase despite 
being able to import CTP (Wylie et al. 1996). Perhaps rapidly 
replicating intracellular pathogens need to supplement cytosine 
import or balance nucleotide pools using CTP synthase. 
Alternatively, CTP synthase may also play a structural role, 
because it has been shown to form an intracellular filament 
structure conserved from bacteria to mammals (Ingerson-Mahar et 
al. 2010; Noree et al. 2010). 

Secretion signal sequences in microsporidian hexokinases 

Hexokinase catalyzes the conversion of glucose to glucose-6- 
phosphate. Glucose-6-phosphate can then enter the glycolytic 
pathway or enter the pentose phosphate pathway (Fig. 4B). In our 
metabolic profiling, we noticed that hexokinase was the only 
glycolytic enzyme highly expressed at 8 hpi (Supplemental Fig. 
S8A; Supplemental Table S5). This early expression suggests that 
hexokinase could perform a function distinct from other glycolytic 
enzymes. We found that hexokinase genes in all three Nematocida 
genomes have a predicted secretion signal, while other enzymes in 
the glycolytic pathway do not (Supplemental Figs. S5, S8A). This 
observation suggests that Nematocida hexokinase could be secreted 
out of the parasite cell and into the host cell. 

Figure 4. Microsporidia hexokinase secretion signals can direct protein 
trafficking through the yeast secretion system. (A) A yeast secretion trap 
system was used to test the ability of microsporidia hexokinase secretion 
signals to direct secretion in a fungal system. Tenfold serial dilutions of 
S. cerevisiae strains containing candidate signal sequences were grown on 
media containing either sucrose (left) or glucose (right). Growth on 
sucrose indicates a functional signal sequence (boxed in red). (B) Model for 
evolutionary events that enable rapid intracellular growth of 
microsporidia: (1) hexokinase acquired a secretion signal sequence, likely 
directing this enzyme into the host cell, where it can increase production 
of amino acids, lipids, and nucleotides; (2) transporters were acquired, 
which can import host-synthesized nucleosides and nucleotides into the 
pathogen meront; and (3) RB was lost, which we speculate could lead to 
a rapid, but mistake-prone, cell cycle, with concomitant DNA and RNA 
synthesis supported by host-derived nucleotides. 

To investigate whether hexokinase secretion into the host cell 
may be a general pathogenic strategy of Microsporidia, we 
examined whether secretion signals are present in other 
microsporidia hexokinases. Indeed, of nine microsporidia genomes 
with annotated hexokinase genes, eight contain predicted secretion 
signal sequences (Supplemental Fig. S8B). Protein alignment of these 
hexokinases indicates that conservation of a secretion signal is not 
simply due to amino acid conservation in the amino terminus 
(Supplemental Fig. S8B) and thus may reflect functional 
conservation. We also analyzed hexokinases from the 12 other fungi 
in our analysis, and none of these enzymes had predicted secretion 
signal sequences. Thus, secreted hexokinase enzymes appear 
specific to Microsporidia and may be a conserved adaptation of these 
intracellular parasites to their host cells. 
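
A hydrophobic amino-terminal stretch is the core feature of a secretion signal sequence. As a rough illustration only (this is not the prediction method used in this study, and it is far cruder than a dedicated signal-peptide predictor), the following sketch scans the amino terminus of a protein with the Kyte-Doolittle hydropathy scale and reports the most hydrophobic window. The function name, the window parameters, and the toy sequence are all hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def max_window_hydropathy(protein, n_term=30, window=9):
    """Return (start_index, mean_hydropathy) of the most hydrophobic
    window within the first n_term residues (0-based start index)."""
    region = protein[:n_term].upper()
    if len(region) < window:
        raise ValueError("sequence shorter than window")
    means = [
        sum(KD[aa] for aa in region[i:i + window]) / window
        for i in range(len(region) - window + 1)
    ]
    start = max(range(len(means)), key=means.__getitem__)
    return start, means[start]

# Hypothetical toy sequence, not a real hexokinase.
toy_protein = "MKSLLLVALLAVLASAQDEKRDSTNEQGHKRDE"
start, score = max_window_hydropathy(toy_protein)
print(f"most hydrophobic window starts at residue {start + 1}, "
      f"mean hydropathy {score:.2f}")
```

A high windowed score near the amino terminus is only suggestive; it cannot distinguish a cleaved signal sequence from a membrane anchor, which is one reason a functional assay is informative.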

To determine experimentally whether these hexokinase 
secretion signal sequences could direct proteins through fungal 
secretory machinery, we tested these sequences in a yeast secretion 
trap system, which uses a simple growth assay to measure secretion 
in another fungus, Saccharomyces cerevisiae (Lee et al. 2006). Using 
this assay, the hexokinase secretion signal sequences from N. parisii 
(ERTm1 and ERTm3), as well as Nematocida sp1 (ERTm2), acted 
as functional secretion signal sequences (Fig. 4A; Supplemental 
Fig. S8C). Furthermore, we tested hexokinase secretion signal 
sequences from three other microsporidia species: E. cuniculi, 
Antonospora locustae, and N. ceranae. In all cases, we found that 
these secretion signals could traffic through the secretory pathway 
of S. cerevisiae. The negative control S. cerevisiae amino-terminal 
domain from either hexokinase 1 (HXK1) or HXK2 did not direct 
secretion, nor did the amino-terminal sequence from an N. parisii 
(ERTm1) GTPase that is not predicted to be secreted, indicating 
specificity of this assay. In summary, our data suggest that during 
infection, microsporidia hexokinases are secreted outside of the 
pathogen into the host cell cytoplasm, where they would be ideally 
situated to boost the production of host metabolites that could be 
used by these pathogens for replication. 

Discussion 

Our analyses provide insights into events that may have facilitated 
the emergence of Microsporidia, the largest phylum of obligate 
intracellular parasites. First, we find that microsporidia hexoki- 
nases have been modified to include a functional secretion signal, 
which likely exports them into host cells (Fig. 4A,B). This finding could 
explain previous results with the microsporidian species Nosema 
gryllii, where activity of several glycolytic enzymes was detected in 
isolated pathogen cells, but hexokinase activity was distinctively 
absent (Dolgikh 2000). Furthermore, it has been reported that 
microsporidia infection causes depletion of host glycogen and 
rapid uptake of glucose (Metenier and Vivares 2001), events that 
could be promoted by delivery of the parasite hexokinase into host 
cells. Hexokinase catalyzes the first step in both glycolysis and the 
pentose phosphate pathway. Therefore, microsporidia hexokinase 
activity within host cells could increase host synthesis of building 
blocks, such as nucleotides, amino acids, and lipids, necessary for 
the rapid growth of these parasites (Fig. 4B). 

Our data show that during N. parisii infection, C. elegans 
intestinal cells become completely consumed by parasites, as 
visualized in Figure 1D. This figure highlights the conversion of host 
intestinal cells into parasite nucleotides, with parasite ribosomal 
RNA shown in red and parasite DNA in blue. Because N. parisii lacks 
almost all nucleotide biosynthetic capability (Supplemental Ma- 
terial), there is considerable demand on host metabolism to pro- 
vide these nucleotides. To meet these demands, microsporidia 
hexokinase would be ideally situated to redirect host cells toward 
anabolic metabolism (Fig. 4B). This host metabolic state shares 
similarities with the Warburg effect, where cancer cells switch to 
glycolysis to help meet their proliferative demands for nucleotides, 
amino acids, and lipids (Vander Heiden et al. 2009). Interestingly, 
the Warburg effect has been observed in host cells infected by 
Kaposi's sarcoma herpesvirus, another obligate intracellular path- 
ogen (Delgado et al. 2010). 

We speculate that an important step toward becoming a suc- 
cessful obligate intracellular pathogen is to increase the bio- 
synthetic output of the host. For Microsporidia, this step may have 
been the conversion of an amino-terminal hydrophobic domain 
found in hexokinase (John et al. 2011) into an amino-terminal 
hydrophobic secretion signal, which would intertwine the me- 
tabolisms of host and microbe. This marriage may have increased 
the fitness and shaped the pathogenesis of microsporidia, while 
similar events in other host/microbe relationships may have led to 
symbiosis, or even the evolution of organelles such as mitochon- 
dria (Valdivia and Heitman 2007; Keeling 2011). 

Our phylogenomic analysis revealed that Microsporidia lost 
the RB tumor-suppressor gene (Fig. 3), which may explain the 
rapid intracellular growth of microsporidia within the host cell 
(Fig. 4B). RB is a key cell cycle inhibitor that stalls cells in the first 
gap phase (G1) of the cell cycle, thus delaying entry into the 
synthesis (S) phase, when DNA replication occurs (Cao et al. 2010). 
In humans, loss of RB leads to an accelerated cell cycle and a 
dramatically increased cancer risk. In fact, most cancer cells have 
lost the RB gene (Polager and Ginsberg 2009). Consistent with rapid 
entry into S phase, we find that N. parisii has an estimated 2.9-h to 
3.3-h doubling time inside C. elegans host cells, quite similar to 
yeast growing in rich media. It is possible that microsporidia have 
adopted a cell cycle inhibitor unrelated in sequence to RB; indeed, 
microsporidia must halt the cell cycle at some point, in order to 
differentiate into spores. However, we speculate that loss of cell 
cycle inhibition early during infection may have allowed 
microsporidia to rapidly take advantage of imported nucleotides 
(see below) in order to outpace host cells. In addition, it is 
interesting to consider that the loss of a G1 cell cycle constraint in 
response to DNA damage would result in an increased mutation rate, 
perhaps explaining the longer branch lengths of Microsporidia 
relative to other Fungi in a phylogenetic tree (Fig. 3). 
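
How the 2.9-h to 3.3-h estimate was derived is not described in this section. As a minimal sketch, assuming simple exponential growth between two measurements, a doubling time can be computed from the fold increase over an interval. The function name and the example counts are hypothetical.

```python
from math import log2

def doubling_time(n_start, n_end, hours):
    """Doubling time in hours, assuming exponential growth from
    n_start to n_end over the given interval."""
    if n_start <= 0 or n_end <= n_start or hours <= 0:
        raise ValueError("need 0 < n_start < n_end and hours > 0")
    doublings = log2(n_end / n_start)
    return hours / doublings

# Hypothetical example: a 256-fold increase (8 doublings) over 24 h.
print(f"{doubling_time(1, 256, 24):.2f} h")  # -> 3.00 h
```

Under this model, a doubling time near 3 h corresponds to roughly eight doublings per day, which conveys how quickly a single parasite cell could fill a host cell.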

Microsporidian genomes encode unique nucleoside and nu- 
cleotide transporters that could draw host-synthesized nucleosides 
into the parasite cell (Fig. 4B). Our phylogenomics studies identi- 
fied Npt nucleotide transporters, as previously shown to be present 
in the microsporidian E. cuniculi, as well as the obligate bacterium 
C. trachomatis. Furthermore, we identified a new class of nucleo- 
side transporters in the microsporidia, which we propose were 
acquired by horizontal gene transfer (Table 2). Transporters like 
these, as well as other microsporidia-specific genes, are candidate 
drug targets for both medical and agricultural purposes. 

We also identified a CTP synthase as the only nucleotide 
synthesis gene retained in the highly reduced genomes of the 
Microsporidia. This gene is also conserved in the reduced genome 
of the bacterium C. trachomatis, and these retentions suggest that it 
may perform a distinct function in addition to biosynthesis. CTP 
synthase has recently been proposed to constitute a cytoskeletal 
system, because it forms filaments in diverse organisms and regu- 
lates cell shape (Ingerson-Mahar et al. 2010; Noree et al. 2010). 
Maintenance of enzymes that serve multiple functions such as 
CTP synthase and perhaps Class I aldolase would be consistent 
with the overall economy of these highly reduced genomes. No- 
tably, retention of CTP synthase as well as acquisition of trans- 
porters are features in common between the reduced prokaryotic 
genome of C. trachomatis and the reduced eukaryotic genomes of 
the Microsporidia, suggesting convergent evolutionary strategies 
between these unrelated obligate intracellular pathogens. Further- 
more, up-regulation of CTP synthase activity has been observed in 
tumors and leukemic cells (Williams et al. 1978; Kizaki et al. 1980; 
Weber et al. 1980), again highlighting the similarities between ob- 
ligate intracellular pathogens and cancer cells. 

Finally, microsporidia may generate genetic diversity by 
undergoing a rare sexual cycle and recombination, or by mitotic 
recombination resulting in loss of heterozygosity, because we found 
evidence to support this in Nematocida. Previous studies in other 
species of microsporidia have suggested the presence of a sexual 
cycle (Becnel et al. 2005). Notably, many species of microsporidia 
are dikaryotic (i.e., have two nuclei paired in one cell). The life 
cycle of N. parisii is less well-characterized than that of other 
species, but it involves meronts of at least two distinct types that 
contain between one and several nuclei within one cell (Troemel 
et al. 2008). These nuclei are often unpaired (monokaryotic), 
although two nuclei in close proximity are occasionally observed, 
and might represent the paired nuclei of a dikaryotic stage. Perhaps 
these nuclei fuse as part of a conventional sexual cycle that 
undergoes reductive division by meiosis or as part of a parasexual 
cycle that undergoes reductive division by chromosomal loss, as 
observed in the fungal pathogen Candida albicans (Bennett and 
Johnson 2003). Our data provide the first molecular support for 
mating and recombination occurring in microsporidia, and we 
speculate that it may allow them to repair mutations, or 
alternatively to propagate mutations and increase virulence, similar 
to other fungal pathogens (Ni et al. 2011). In summary, we propose 
that rapid clonal proliferation using the cancer-like strategies 
described above, together with a rare sexual cycle, may complement 
each other to allow for rapid intracellular growth and successful 
spread of microsporidia throughout the animal kingdom. 

Methods 

Nematocida genomic DNA isolation 

All three Nematocida strains (N. parisii strains ERTm1 and ERTm3, 
and Nematocida sp1 strain ERTm2) were transferred into the 
C. elegans wild-type N2 strain for propagation and harvest (Troemel 
et al. 2008). See the Supplemental Methods for details on harvest 
and DNA extraction. 

Nematocida RNA isolation and RNA-seq library construction 

C. elegans growth and infections were performed at 25°C, and 
N. parisii spores were prepared as previously described (Estes et al. 
2011). The C. elegans temperature-sensitive sterile strain 
fer-15(b26);fem-1(hc17) was used to prevent internal hatching of 
progeny during the progression of infection. Synchronized 
fer-15;fem-1 L1s were added to 10-cm NGM plates seeded with a lawn 
of E. coli OP50 bacteria, grown for 24 h at the restrictive 
temperature of 25°C, and then 3 x 10'' N. parisii ERTm1 spores diluted 
in M9 media (or M9 media alone for uninfected controls) were spread 
over the entire surface of these plates. Infected and control animals 
were harvested at the appropriate times, and total RNA was 
extracted using TRI Reagent (Molecular Research Center, Inc). See 
the Supplemental Methods for details on assessing life cycle and 
infection kinetics using microscopy, RNA quality analysis, and 
RNA isolation from spores. Strand-specific libraries were 
constructed for all RNA samples using the dUTP second strand 
marking method (Parkhomchuk et al. 2009; Levin et al. 2010). For 
detailed protocols, see the Supplemental Methods. 

Genome sequencing, assembly, and single nucleotide 
polymorphisms (SNPs) 

DNA for each genome was sequenced using 454 FLX-Titanium 
technology for two whole-genome shotgun libraries, 400-base 
fragment, and 2.7-kb paired-end reads. Reads were screened by 
BLAST against the nonredundant GenBank nucleotide database to 
identify possible contamination; of 1000 reads examined for each 
species, <2% matched the E. coli genome and <0.1% matched 
the C. elegans genome. 454 reads were assembled with Newbler 
(MapAsmResearch-03/15/2010). A very high fraction (>95%) of 
reads were assembled. Contamination with C. elegans and E. coli 
was removed using Newbler's -vs option. Assemblies were 
deposited in GenBank for N. parisii ERTm1 (AEFF01000000), N. parisii 
ERTm3 (AEOO01000000), and Nematocida sp1 (AERB01000000). 
For further details, see the Supplemental Methods. 

Illumina sequence for two ERTm2 samples was used to 
identify polymorphisms using GATK. We generated 101-bp reads for 
the same sample used to generate the genome sequence using 454 
data, and for a second sample serially passaged for 14 wk (as de- 
scribed above). For further details, see the Supplemental Methods. 

Gene prediction and RNA-seq analysis 

For ERTm1 and ERTm2, an initial set of protein-coding genes was 
predicted using Prodigal (Hyatt et al. 2010). For ERTm3, the initial 
gene set was generated by a synteny-based transfer of the gene set 
from ERTm1; transferred genes with internal frameshifts were 
replaced by Prodigal predictions for that locus. RNA-seq was used 
to refine the gene predictions for ERTm1 (Supplemental Methods). 
To calculate the coverage of all RNA-seq data (including spores) 
after we had ascertained there were no spliced genes, we realigned 
all sequences to the ERTm1 genome using BWA (Li and Durbin 
2009) and then calculated fragments per kilobase of transcript per 
million fragments mapped (FPKM) (Trapnell et al. 2010). For 
further details, see the Supplemental Methods. 
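
For orientation, the FPKM unit can be written as a direct calculation. The sketch below uses the plain definition (fragments per kilobase of transcript per million mapped fragments) rather than the model-based estimation in the cited software, and the gene names, lengths, and counts are hypothetical.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)

# Hypothetical per-gene fragment counts and transcript lengths.
counts = {"gene_A": 500, "gene_B": 20}
lengths_bp = {"gene_A": 1000, "gene_B": 2500}
library_size = 10_000_000  # hypothetical total mapped fragments

for gene in counts:
    value = fpkm(counts[gene], lengths_bp[gene], library_size)
    print(f"{gene}\t{value:.2f}")
```

Normalizing by both transcript length and library size is what allows expression to be compared between genes of different sizes and between time points sequenced to different depths.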

Gene clustering and evolutionary analysis 

Orthologous gene families were identified for the three Nematocida, 
seven other microsporidia, 12 fungi, and one choanoflagellate 
outgroup (see Supplemental Methods) using ORTHOMCL version 
1.4 with a Markov inflation index of 1.5 and a maximum e-value of 
1 X 10"^. To estimate a phylogeny, we selected 53 orthologs 
present as single copy in all genomes. Amino acid sequences of 
these clusters were aligned using MUSCLE (Edgar 2004), and 
poorly aligned regions were trimmed using trimAl under default 
settings (Capella-Gutierrez et al. 2009). We then estimated a 
phylogeny using the PROTGAMMABLOSUM62 model in RAxML 
(Stamatakis 2006) with 1000 bootstrap replicates. We compared 
this multi-ortholog-based phylogeny with phylogenies with 
Microsporidia repositioned to be (1) sister group to Rhizopus, and (2) 
sister group to the chytrids, with the approximately unbiased test 
(Shimodaira 2002) as implemented in CONSEL (Shimodaira and 
Hasegawa 2001), using the single-copy core amino acid alignments 
used to generate the organismal phylogeny. For further details, see 
the Supplemental Methods. 
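
The selection of orthologs present as single copy in all genomes can be sketched as a filter over ortholog clusters. This is an illustration under assumptions: the data structure (a dictionary mapping cluster IDs to lists of genome and gene pairs), the function name, and the example identifiers are hypothetical and do not reflect the actual ORTHOMCL output format.

```python
from collections import Counter

def single_copy_core(clusters, genomes):
    """Return IDs of clusters with exactly one gene from every genome."""
    wanted = set(genomes)
    core = []
    for cluster_id, members in clusters.items():
        per_genome = Counter(genome for genome, _gene in members)
        present_everywhere = set(per_genome) == wanted
        single_copy = all(n == 1 for n in per_genome.values())
        if present_everywhere and single_copy:
            core.append(cluster_id)
    return sorted(core)

# Hypothetical clusters over two genomes.
genomes = ["genome_A", "genome_B"]
clusters = {
    "c1": [("genome_A", "a1"), ("genome_B", "b1")],                      # kept
    "c2": [("genome_A", "a2"), ("genome_A", "a3"), ("genome_B", "b2")],  # paralogs
    "c3": [("genome_A", "a4")],                                          # missing genome_B
}
print(single_copy_core(clusters, genomes))  # -> ['c1']
```

Restricting to single-copy clusters avoids having to choose among paralogs when concatenating alignments for the organismal phylogeny.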

Yeast secretion trap analysis of candidate signal sequences 

To test whether predicted signal sequences would direct trafficking 
through the S. cerevisiae secretory pathway, we used a yeast 
secretion trap system as described before (Lee et al. 2006). Briefly, 
we PCR-amplified the first 85 amino acids from candidate genes and 
fused them at their C termini to the S. cerevisiae invertase reporter 
gene using vector pYST-1 (a kind gift from Jocelyn Rose). The DNA 
templates for PCR amplification were extracted from spores 
(N. ceranae spores were a kind gift of James Nieh and Guntima 
Suwannapong, E. cuniculi spores were a kind gift of Quanshun 
Zhang and Saul Tzipori, and A. locustae spores were purchased from 
M&R Durango). The resulting fusions were transformed into an 
invertase deletion strain of S. cerevisiae SEY6210 (ATCC 96099) and 
the transformants plated onto synthetic leucine drop-out medium 
with 2 μg/mL antimycin A (Sigma-Aldrich A8674) and either glucose 
or sucrose as the sole carbon source. Secretion of the invertase fusion 
protein directed by the amino-terminal candidate gene sequence 
rescues the deletion mutant and results in yeast growth on sucrose. 
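
The region amplified for each fusion, the first 85 amino acids, corresponds to the first 255 nucleotides of the coding sequence. A minimal sketch of extracting that fragment follows; the function name and the toy sequence are hypothetical, and primer design and vector junction details are not modeled.

```python
def n_terminal_fragment(cds, n_aa=85):
    """Return the first n_aa codons of a coding sequence (5'->3')."""
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence should begin with a start codon")
    n_nt = 3 * n_aa
    if len(cds) < n_nt:
        raise ValueError("coding sequence shorter than requested fragment")
    # A whole number of codons keeps a downstream fusion in frame.
    return cds[:n_nt]

# Hypothetical toy coding sequence (start codon plus 100 alanine codons).
toy_cds = "ATG" + "GCT" * 100
fragment = n_terminal_fragment(toy_cds)
print(len(fragment))  # -> 255
```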



Data access 

Assemblies and annotations were submitted to GenBank 
(http://www.ncbi.nlm.nih.gov/genbank) under the following accession 
numbers: N. parisii ERTm1 (AEFF01000000), N. parisii ERTm3 
(AEOO01000000), and N. sp1 ERTm2 (AERB01000000). SNP data 
were submitted to dbSNP (www.ncbi.nlm.nih.gov/projects/SNP/) 
under handle BROAD-GENOMEBIO and scheduled for release in 
Build 136 (ERTm2: sbid 1057048; ERTm1: sbid 1057268; ERTm3: 
sbid 1057267 and 1057269). All genomes, gene sets, and SNPs for 
this project are publicly available on the Broad website 
(http://www.broadinstitute.org/annotation/genome/microsporidia_comparative). 